Re: UTF-8 question.

From Dan Sugalski
Subject Re: UTF-8 question.
Date
Msg-id a06110407bd6fe9f9673a@[192.168.1.105]
In response to UTF-8 question.  ("Richard Connamacher" <rich.n1@indieimage.com>)
List pgsql-general
At 8:39 PM -0400 9/16/04, Richard Connamacher wrote:
>I'm new to PostgreSQL, and from the looks of it, it's a great database,
>and I'll be using more of it in the future.
>
>I had a quick question if anyone could clear this up. The documentation
>for PostgreSQL (version 7.1, the version this server is using) says that
>it supports multibyte character encodings like Unicode (which implies
>UTF-16 encoding).

Don't confuse Unicode, the 'character set' and rules for characters,
represented by a sequence of abstract integers (code points), with
UTF-[8|16|32], which are ways to encode those abstract integers into a
stream of bytes someplace.
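
A quick way to see the difference, as a minimal Python sketch (the
euro sign is just an arbitrary sample character, not anything from the
PostgreSQL docs): one abstract code point, three different byte
streams depending on which UTF you pick.

```python
# One abstract code point, three different byte encodings.
ch = "\u20ac"          # EURO SIGN, chosen only as an illustration
code_point = ord(ch)   # the abstract integer behind the character

# Big-endian variants are used so no byte-order mark is prepended.
encoded = {
    "utf-8": ch.encode("utf-8"),
    "utf-16-be": ch.encode("utf-16-be"),
    "utf-32-be": ch.encode("utf-32-be"),
}

for name, data in encoded.items():
    print(f"U+{code_point:04X} as {name}: {data.hex()} ({len(data)} bytes)")
```

The integer stays the same throughout; only the bytes on the wire
change.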

>  Later on, the same page says that Unicode is
>represented using UTF-8 encoding. UTF-8 is the 8-bit version of Unicode.
>The multibyte version of Unicode is UTF-16.
>
>So, which is it? If I create a database using Unicode as the encoding,
>will the encoding be UTF-8 (singlebyte) or UTF-16 (multibyte)?

Erm... UTF-8 *is* a multibyte encoding. Up to 6 bytes per code point
in the original definition, if things get really degenerate (RFC 3629
caps it at 4). (And, last I checked, that means you can have up to 70
bytes for really degenerate characters, but my memory might be off
(could be 80).)
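
To make the "multibyte" part concrete, here is a small Python sketch.
The sample characters are arbitrary picks, and Python's codec follows
RFC 3629, so it never emits more than 4 bytes for one code point. The
last bit assumes that "degenerate characters" means a base letter with
combining marks stacked on it, which is one reading of the remark
above, not something the thread spells out.

```python
# UTF-8 uses a variable number of bytes per code point.
samples = {
    "A": "A",                 # U+0041, ASCII range
    "e-acute": "\u00e9",      # U+00E9
    "euro": "\u20ac",         # U+20AC
    "g-clef": "\U0001d11e",   # U+1D11E, outside the 16-bit range
}
utf8_lengths = {name: len(s.encode("utf-8")) for name, s in samples.items()}

# One user-visible character built from several code points:
# a base letter followed by three combining acute accents (U+0301).
stacked = "e" + "\u0301" * 3
stacked_len = len(stacked.encode("utf-8"))

for name, n in utf8_lengths.items():
    print(f"{name}: {n} byte(s)")
print(f"stacked character: {len(stacked)} code points, {stacked_len} bytes")
```

Each extra combining mark adds its own bytes, which is how a single
displayed character can grow well past the per-code-point limit.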

UTF-8, UTF-16, and UTF-32 will all encode Unicode characters just fine.
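
A rough check of that in Python, with a made-up sample string: every
encoding round-trips the same text, they just disagree on how many
bytes it takes.

```python
# The same text survives a round trip through each encoding.
text = "na\u00efve \u20ac \U0001d11e"   # hypothetical sample, 9 code points

sizes = {}
for codec in ("utf-8", "utf-16-le", "utf-32-le"):
    data = text.encode(codec)
    # Decoding with the same codec must give back the original string.
    assert data.decode(codec) == text
    sizes[codec] = len(data)

print(sizes)
```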
--
                Dan

--------------------------------------it's like this-------------------
Dan Sugalski                          even samurai
dan@sidhe.org                         have teddy bears and even
                                       teddy bears get drunk
